智能论文笔记

Social-Inverse: Inverse Decision-making of Social Contagion Management with Task Migrations

Guangmo Tong

分类：机器学习

2022-09-21

考虑两个决策任务$ a $和$ b $，每个都希望计算给定\ textit {query} $ x $的有效\ textit {dekistion} $ y $，{我们可以解决任务$ b $通过使用$ a $的查询决定对$（x，y）$，而不知道潜在的决策模型吗？}此类问题，称为\ textit {用任务迁移的倒数决策}，很感兴趣现实世界应用的随机性通常会阻止代理完全了解基础系统。在本文中，我们引入了正式配方的新问题，并提出了一个通用框架，用于解决社会传染管理中的决策任务。在理论方面，我们提出了一个概括分析，以证明我们的框架学习绩效。在经验研究中，我们进行理智检查，并将提出的方法与其他可能的基于学习的方法和基于图的方法进行比较。我们已经获得了有希望的实验结果，首次确认可以通过使用与另一个相关的解决方案来解决一项决策任务。

translated by 谷歌翻译

Learning the Propagation of Worms in Wireless Sensor Networks

Yifan Wang , Siqi Wang , Guangmo Tong

分类：机器学习

2022-09-20

无线传感器网络（WSN）由空间分布式传感器组成，被认为容易受到蠕虫及其变体的攻击。由于蠕虫传播的独特策略，动态行为因传感器的不同特征而异。建模蠕虫的传播可以帮助我们了解蠕虫攻击行为并分析传播程序。在本文中，我们在各种蠕虫下设计了一个通信模型。我们旨在学习我们提出的模型，以分析得出竞争性蠕虫传播的动态。我们开发了一个新的搜索空间与复杂的神经网络模型相结合。此外，实验结果验证了我们的分析，并证明了我们提出的学习算法的性能。

translated by 谷歌翻译

Deep-Steiner: Learning to Solve the Euclidean Steiner Tree Problem

Siqi Wang , Yifan Wang , Guangmo Tong

分类：机器学习

2022-09-20

Euclidean Steiner树问题寻求Min-Cost网络来连接目标位置的集合，并且是无线网络的许多应用程序的基础。在本文中，我们介绍了一项研究，以通过图表学习增强的增强学习来解决欧几里得施泰纳树问题。与搜索空间有限的常见连接问题（例如旅行推销员问题或车辆路由问题）不同，欧几里得施泰纳树问题需要在整个欧几里得空间上进行搜索，从而使现有方法不适用。在本文中，我们通过利用Steiner树的独特特征来设计离散方法，并提出了新的训练方案来处理增量结构期间出现的动态施泰纳点。我们的设计通过在数据集集合上进行的实验进行理智检查，令人鼓舞的结果证明了我们方法的实用性，可作为经典组合方法的替代方法。

translated by 谷歌翻译

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

分类：计算机视觉

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.

translated by 谷歌翻译

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

Xiangtai Li , Shilin Xu , Yibo Yang , Haobo Yuan , Guangliang Cheng , Yunhai Tong , Zhouchen Lin , Dacheng Tao

分类：计算机视觉

2023-01-03

Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.

translated by 谷歌翻译

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

Jianzong Wu , Xiangtai Li , Henghui Ding , Xia Li , Guangliang Cheng , Yunhai Tong , Chen Change Loy

分类：计算机视觉

2023-01-02

In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.

translated by 谷歌翻译

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

Huaizheng Zhang , Yuanming Li , Wencong Xiao , Yizheng Huang , Xing Di , Jianxiong Yin , Simon See , Yong Luo , Chiew Tong Lau , Yang You

分类：机器学习

2023-01-01

New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.

translated by 谷歌翻译

Ponder: Point Cloud Pre-training via Neural Rendering

Di Huang , Sida Peng , Tong He , Xiaowei Zhou , Wanli Ouyang

分类：计算机视觉

2022-12-31

We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering. Motivated by the fact that informative point cloud features should be able to encode rich geometry and appearance cues and render realistic images, we train a point-cloud encoder within a devised point-based neural renderer by comparing the rendered images with real images on massive RGB-D data. The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but low-level tasks like 3D reconstruction and image synthesis. Extensive experiments on various tasks demonstrate the superiority of our approach compared to existing pre-training methods.

translated by 谷歌翻译

Label-Efficient Interactive Time-Series Anomaly Detection

Hong Guo , Yujing Wang , Jieyu Zhang , Zhengjie Lin , Yunhai Tong , Lei Yang , Luoxing Xiong , Congrui Huang

分类：机器学习 | 人工智能

2022-12-30

Time-series anomaly detection is an important task and has been widely applied in the industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels in a low-cost way, which enables the customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions as the time-series data is numerically continuous and difficult to be understood. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection by performing only a small amount of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically using only a few labeled data. All of these techniques are complementary and can promote each other in a reinforced manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both weak supervision and active learning areas. Also, the system has been tested in a real scenario in industry to show its practicality.

translated by 谷歌翻译

Large-scale single-photon imaging

Liheng Bian , Haoze Song , Lintao Peng , Xuyang Chang , Xi Yang , Roarke Horstmeyer , Lin Ye , Tong Qin , Dezhi Zheng , Jun Zhang

分类：计算机视觉

2022-12-28

Benefiting from its single-photon sensitivity, single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacture craft and heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 $\times$ 32 pixels, 90 scenes, 10 different bit depth, 3 different illumination flux, 2790 images in total) to calibrate noise model parameters. With this real-world physical noise model, we for the first time synthesized a large-scale realistic single-photon image dataset (image pairs of 5 different resolutions with maximum megapixels, 17250 scenes, 10 different bit depth, 3 different illumination flux, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can dig global contextual features to remove multi-source noise and extract full-frequency details. We applied the technique on a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with more than 5 dB superiority on PSNR compared to the existing methods.

translated by 谷歌翻译